ABSTRACT

Proposals for the use of the computer in the 
humanities often ask more of the machine than it can reasonably 
yield, and the enthusiastic generation of data for dictionary 
projects may well overburden the editors who must eventually cope 
with it. Procedures in lexicography are not well enough defined for a 
substantial burden to be placed on the logical capabilities of the 
computer. Most data collection must still be left in the hands of 
human readers, though editing of the data may be carried on with the 
use of on-line devices in which man interacts with machine. The use 
of the computer in the final stages of producing a dictionary, 
however, may yield important results in speeding production and in 
making available a reservoir of data for other purposes. This paper 
will be published as part of a Festschrift for Hans Kurath later
this year. (Author/FB) 
Almost all humanistic scholars who undertake work with the
computer claim as a by-product of their research the clarification
of fuzzy questions in their discipline. Whether they are concerned
with historical, literary, or linguistic matters, they predict that
the crooked shall be made straight, the rough places plain. In the
last decade a great variety of massive projects has been undertaken
in the hope that mechanical data processing techniques can increase
the scholar's power to do his proper job and eliminate the meticulous
and unproductive sorting or copying of paper slips that occupies so
much time in the work of the dialectologist (Shuy 1966) and the
lexicographer (Bailey and Robinson 1970). Some of these projects
must be regarded as spectacular failures: some have collapsed because
of a too sanguine estimate of the extent to which the computer can
share the scholar's burden; others through neglect of the careful
planning characteristic of the early stages of the Linguistic Atlas of
the United States and Canada and of the Middle English Dictionary.

A severe appraisal of recent efforts in humanistic computing will
surely not be long in coming, but on this occasion we will pass over
the shortcomings of this work. Nevertheless it should be noted that
humanistic disciplines already oriented toward statistical or iterative
methods have produced the most interesting results so far. At the
moment, we have no really successful parsing systems, much less
mechanical translators, and proposed systems for semantic analysis
are very far from turning out descriptions that are of interest to
the linguist, the critic, or the lexicographer. The relation between
explicitness and traditional humanistic pursuits is a reciprocal one:
objectification and clarification of interesting questions work in
both directions, from the computer and from the discipline itself. Any
major undertaking, such as a scholarly dictionary, must take
into account the present state of the art. If the questions to be
asked are already well defined, some success may reasonably be expected.
But if the questions are not clearly formulated, a much more cautious
view is certainly called for as the humanist approaches the machine.

While a general critical review of computer-based humanistic 
studies has yet to be carried out, the burgeoning projects in machine 
translation of the late fifties have been subjected to quite thorough 
scrutiny. In addition to the intellectual failures previously suggested,
many of these projects also failed to justify themselves on economic
grounds, at least insofar as the goal of routine mechanical translation
of documents was concerned. A committee of the National Academy of
Sciences appointed in 1964 declared itself 'puzzled by a rationale for
spending substantial sums of money [some $20 million] on the mechanization
of a small and already economically depressed industry with a full-time
and part-time labor force of less than 5,000' (National Academy of
Sciences 1966:12). This committee recognized that many mechanical
translation projects made contributions to general linguistic theory
and to particular problems of language analysis; therefore it concluded
that 'it is wise to press forward undaunted, in the name of science,
but that the motive for doing so cannot sensibly be any foreseeable
improvement in practical translation' (24). Scholarly lexicography,
while not obviously a depressed industry, must carefully weigh the
economic justification for the necessarily large expenditures
attendant on the use of computers. Both intellectual and economic
considerations must assume substantial roles in the lexicographer's
thinking as he begins to scrutinize the premise of technological
aids to his craft.

Operating under the guise of an apparently scientific method,
lexicography is a notoriously ill-defined science. Writing dictionary
entries, the lexicographer's central task, requires a polymath,
sensitive to linguistic nuance and wise in all the lore that people
using language talk about. 'No editor,' as Ernst Leisi has recently
pointed out, 'has ever produced a theory of definition nor, for that
matter, been explicit about the methods that have led him to his
conclusions' (Leisi 196^:15). It would be a great mistake, we believe,
to think of the computer as a potential 'simulator' of a dictionary
editor's behavior. Nonetheless, there are tasks that can be usefully
delegated to the machine to take some of the harmful drudgery out of
lexicography.

Three major stages can be distinguished in the making of dictionaries:

1) collection of the data on which the dictionary will be based;
2) preparation of the entries, including the choice of canonical
forms, the writing of pronunciations, usage notes, definitions, and,
in the case of historical and unabridged dictionaries, the selection
of illustrative examples from the citation file; and
3) the production of the finished work.

What contribution can the computer make to each of these three stages?
1. Establishing a file of usage. Collecting citations for a dictionary
has traditionally been carried out by carefully trained readers: an
array of volunteers in the case of the OED and the period and regional
descendants of that dictionary, or a substantial group of house
readers like that maintained by the G. & C. Merriam Company in
Springfield. For reasons that can be easily imagined, a large number
of qualified and willing volunteer readers is more difficult to recruit
today than it was in the last century, or even a generation ago when
the late Professor Charles C. Fries sent out the call for readers for
his proposed Early Modern English Dictionary. Varying standards of
transcription and accuracy vitiate the usefulness of collections made
by independent scholars, and much apparently valuable material had
ultimately to be abandoned by Professor Kurath in the early stages of
the Middle English Dictionary; similar difficulties also vexed Walter
S. Avis in his work on the Dictionary of Canadianisms (Avis 1969).
It is therefore no surprise that the possibility of using a computer to
take over the laborious work of establishing a file of usage has
come to mind wherever new dictionary projects have been initiated.

A thorough survey of those centers where the computer is now at
work in establishing a file of citations for dictionary makers would
be a considerable undertaking, but we do want to suggest just how
widespread this use of the computer is. At Nancy, the Centre de
Recherche pour un Trésor de la Langue Française employs thirty-eight
full-time clerks in the transcription of texts for its citation file,
a collection that will eventually contain excerpts from 250 million
words of text (Centre National de la Recherche Scientifique 1967). A
similarly substantial project making the fullest use of automation
is underway in Jerusalem in work for the proposed Historical Dictionary
of the Hebrew Language (Ben-Hayyim 1966). The computer likewise
plays a prominent role in excerpting texts for a Dictionary of Middle
High German (Raben 1969:296), a Historical Dictionary of the Italian
Language (Raben 196?:79), and the Dictionary of the Older Scottish
Tongue in Edinburgh (Aitken and Bratley 1966). Work is now beginning
for a new Old English Dictionary at Toronto and Waterloo, Ontario, that
will make significant use of the computer in collecting, and similar
lexicological projects are being undertaken for several Quechumaran
languages (Wölck 1969), for Latin, Swedish, Dutch, Modern German,
Serbo-Croatian, and Russian (see National Science Foundation 1969 and
Raben 1969). Nearly all of these projects employ something like the
concordance principle to prepare materials for the dictionary editor's
hands, and in all cases the work is being undertaken with great vigor
and undoubtedly great expense.
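The "concordance principle" these projects share can be sketched as a keyword-in-context (KWIC) index: every running word of a text is filed under its own form together with a few words of surrounding context. The Python below is a minimal illustration only, assuming a crude regular-expression tokenizer and a fixed context width; the function name `kwic_concordance` is hypothetical and does not describe the software of any project named above.

```python
import re
from collections import defaultdict

# Assumed tokenizer: runs of letters and apostrophes count as words.
WORD = re.compile(r"[A-Za-z']+")

def kwic_concordance(text, width=3):
    """Map each lower-cased word to a list of (left, keyword, right) contexts.

    `width` is the number of words kept on each side of the keyword.
    """
    tokens = WORD.findall(text)
    index = defaultdict(list)
    for i, tok in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        index[tok.lower()].append((left, tok, right))
    return dict(index)

if __name__ == "__main__":
    sample = "the crooked shall be made straight, and the rough places plain"
    conc = kwic_concordance(sample, width=2)
    for left, kw, right in conc["the"]:
        print(f"{left:>20} [{kw}] {right}")
```

Every occurrence of every word is kept, which is exactly why, as the following paragraphs argue, an unfiltered concordance of a large corpus overwhelms the editor.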



Despite this widespread activity, we feel that there are some
serious limitations to the use of the computer for establishing a
citation file and that enthusiasm for the machine has sometimes
overridden the better judgment of the scholar. Professor Louis
Milic of Cleveland State, for example, reports that he was asked
to become a volunteer reader for a proposed English dictionary of
the Tudor period. Incredulous that he was asked to copy citations
in longhand from texts in the period, he announces that he found
the editor's request 'baffling.' 'So jarred was I,' he says, 'that
I actually began by acquiescing, just to see what sort of work it was.
I can report that it was tedious, exacting and repetitive, just the
sort of thing that a computer does very well' (Milic 1967:14). Of
course Professor Milic is a pioneer in literary and linguistic computing,
and he is thoroughly familiar with the powers (and weaknesses) of
the machine for work of this sort. But our own experience in work
on the Early Modern English Dictionary leads us to the conclusion
that the computer does this tedious, exacting, and repetitive work
all too well. The result of an unbridled automatic reading program
for even a small number of texts will simply inundate the editor
with material and postpone, rather than hasten, the production of
a dictionary.

An excerpt from the directions for OED readers suggests the
complexity of a reasonable reading program for a dictionary. Much
of the decision-making task is left to that most human of attributes,
judgment:



Make a quotation for every word that strikes you as rare,
obsolete, old-fashioned, new, peculiar, or used in a peculiar way.

Take special note of passages which show or imply that a word
is either new and tentative, or needing explanation as obsolete or
archaic, and which thus help fix the date of its introduction or
disuse.

Make as many quotations as you can for ordinary words, especially
when they are used significantly, and tend by the context to explain
or suggest their own meaning. (OED I:xv)
It is quite obvious that no mechanical reading program can be devised
to extract just those peculiar usages that editors want, or even to
do a successful job at gathering 'defining quotations' for normal
uses of ordinary vocabulary. Consider, for example, a use of the
definite article that apparently arose in the Early Modern period
and is illustrated in the OED by a quotation from Nashe: "To borrowe
some lesser quarry of elocution from the Latine." How could the
computer be sensibly programmed to select this use of 'the' with
the name of a language without swelling the total number of citations
with yards of 'the'? Lexicographers prize these special usages, and
the computer cannot approach the skill of even the most witless
volunteer in collecting them unless some sort of oblique strategy
is employed in searching for them. Human readers, as the experience
of the editors of the Merriam-Webster dictionaries has shown (Macdonald
1962:171-72), are particularly adept at extracting novelties of the
kind required by the OED directions. Extracting citations from texts,
then, would seem to be a job in which cooperation between man and
machine is required: human readers can be asked to identify peculiar
usages while the computer can be expected to extract more representative
examples of the use of vocabulary. Once a complete concordance has
been made, a simple routine can be devised to yield a random selection
of frequently occurring words to supplement a collection made by
human readers.
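The "simple routine" for sampling frequent words might look like the following sketch. It assumes the concordance is a mapping from each word to its list of contexts, as a KWIC index would produce; the function name and the values of `threshold` and `per_word` are illustrative assumptions, not figures from the Early Modern English Dictionary project.

```python
import random

def sample_concordance(concordance, threshold=50, per_word=10, seed=None):
    """Keep every context of a rare word; randomly thin out frequent ones.

    `concordance` maps each word to a list of its contexts. Words with at
    most `threshold` occurrences keep all of them; more frequent words keep
    a random sample of `per_word` contexts, so they are represented in the
    file without swamping it.
    """
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    selected = {}
    for word, contexts in concordance.items():
        if len(contexts) <= threshold:
            selected[word] = list(contexts)
        else:
            selected[word] = rng.sample(contexts, min(per_word, len(contexts)))
    return selected

if __name__ == "__main__":
    # Toy concordance: "the" is very frequent, "quarry" occurs once.
    conc = {
        "the": [f"context {i}" for i in range(500)],
        "quarry": ["some lesser quarry of elocution"],
    }
    thinned = sample_concordance(conc, threshold=50, per_word=10, seed=1970)
    print(len(thinned["the"]), len(thinned["quarry"]))  # 10 1
```

Sampling without replacement keeps the supplement representative of a word's ordinary uses while leaving the rare and peculiar usages to the human readers.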

Mechanized reading programs, we suggest, can be allowed to bear the
whole burden of citation collecting only in those cases where the total
corpus for which the dictionary is responsible is quite limited. Old
English is a good example of such a case, and the two editors of the
new Old English Dictionary can build a citation collection based on
virtually the whole body of Old English texts without making their
editorial job unmanageable. For more recent periods of our language,
a brute-force approach in which all usages are isolated is impracticable.
The expense of keyboarding more than 10,000 texts for Early Modern
English (the OED corpus for the period) can hardly be justified,
and for a dictionary of a more recent period the lexicographer can do
no more than sample the huge volume of printed language for which
the dictionary is responsible.

Nevertheless, the potential for using the computer in building
the citation file opens opportunities that were beyond the reach of
earlier lexicographers. In our materials for the Early Modern English
Dictionary, we have more than two million slips obtained from volunteer
readers that we believe represent a considerable proportion of those
special usages requested by the OED editors; half a million more
result from "saturation reading" of some fifty texts selected for
their particular linguistic significance, and of course nothing is so
useful for saturation reading as the uncomplaining computer. Where
scholars like William A. Elwood of Virginia are generous in allowing
us to use Renaissance texts keypunched for other purposes, we can
produce handsome slips ready for the file quickly and easily. The
slips we have prepared using Elwood's version of "The Defense of
Poetry" cost about three and a half cents each, according to the current
accounting scheme used by the Computing Center at the University
of Michigan. Nearly a decade ago, a commercial lexicographer estimated
the cost of gathering slips by traditional means at more than thirty
cents each, and thanks to economic inflation the present-day cost
must be considerably higher (see Barnhart 1962:167). As a result of
the programming skills of Dr. Victor J. Streeter, our computer-produced
slips match the format of man-made ones, providing on each five-by-eight
slip a generous context for the selected word, full bibliographical
information, and a note on the keypunching conventions for special
characters. By using a random number generator, we can extract a
manageable selection of frequently occurring words rather than excluding
them altogether, as is usually the case with concordance projects. More
extensive use of this computer system will help build a file that will
yield valuable information for the editing process without making
the collection too enormous for its intended use.
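A computer-produced slip of the kind just described bundles three things: a generous context, full bibliographical information, and a note on keypunching conventions. The sketch below assumes a hypothetical `Citation` record and a plain-text layout wrapped to a fixed width; the field names, the wording of the note, and the bracketed sample values are placeholders, not the format Dr. Streeter actually used.

```python
import textwrap
from dataclasses import dataclass

# Hypothetical wording; real slips would describe the project's own conventions.
KEYPUNCH_NOTE = ("Keypunched text: special characters are represented "
                 "by the project's transcription conventions.")

@dataclass
class Citation:
    headword: str   # the selected word
    context: str    # generous surrounding text
    author: str
    title: str
    date: str
    location: str   # page, signature, or line reference

def format_slip(c: Citation, width: int = 60) -> str:
    """Lay out one citation as a plain-text slip no wider than `width`."""
    lines = [c.headword.upper(), ""]
    lines += textwrap.wrap(c.context, width)
    lines.append("")
    lines += textwrap.wrap(f"{c.author}, {c.title} ({c.date}), {c.location}", width)
    lines += textwrap.wrap(KEYPUNCH_NOTE, width)
    return "\n".join(lines)

if __name__ == "__main__":
    print(format_slip(Citation(
        headword="quarry",
        context="[several lines of context surrounding the selected word]",
        author="[Author]", title="[Title]", date="[date]", location="[page]",
    )))
```

Keeping the slip as a structured record until the last moment means the same citation can later be printed, sorted, or displayed on a terminal without re-keying.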

2. Easing the editor's burden. The editing process is intellectually
the most interesting part of dictionary making and the most baffling.
Though one might hope that the computer could at least provide the
editor with a rough sort of the material in the file, in practice the
difficulties are enormous. Even the job of separating grammatical
homographs like abstract (noun, verb, and adjective) is a difficult
one, and the task of separating the senses of polysemous words, such as
helm 'of a ship' from helm 'a helmet', seems at the moment almost
insuperable. In studies now underway, we are exploring the possibility
of gathering textual variants under their canonical form, though
without enormously increasing the complexity and expense of the program
we cannot expect complete success in this effort. In supposing that
the computer can assume the whole burden of grammatical and semantic
analysis, we have once again expected too much of the computer and too
little of ourselves.
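Why variant-gathering resists a cheap solution can be seen in a deliberately crude sketch. It assumes an editor-supplied table of known variants plus two rough normalizing rules for Early Modern spelling (treating u/v and i/j as one letter, and ignoring a final -e on longer words); the rules, the table entry, and the function names are hypothetical illustrations, not the procedure under study at Michigan.

```python
from collections import defaultdict

# Hypothetical editor-supplied table: known variant -> comparison key.
VARIANT_TABLE = {"latine": "latin"}

def spelling_key(form):
    """Reduce a spelling to a crude comparison key (assumed rules)."""
    word = form.lower()
    if word in VARIANT_TABLE:
        return VARIANT_TABLE[word]
    # Treat u/v and i/j as single letters.
    word = word.replace("v", "u").replace("j", "i")
    # Ignore a final -e on words longer than three letters.
    if len(word) > 3 and word.endswith("e"):
        word = word[:-1]
    return word

def group_variants(forms):
    """Gather attested forms under a shared key for the editor to review."""
    groups = defaultdict(set)
    for form in forms:
        groups[spelling_key(form)].add(form)
    return dict(groups)

if __name__ == "__main__":
    forms = ["loue", "love", "Latine", "latin", "borrowe", "borrow", "hope", "hop"]
    for key, members in sorted(group_variants(forms).items()):
        print(key, sorted(members))
```

These rules correctly bring loue together with love and borrowe with borrow, but they also file hope with hop: the kind of false grouping that only a more elaborate program, or the editor, can catch.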

The most promising area of research in computational linguistics,
we believe, focuses on the interaction between man and machine. The
computer is asked to do what it can do best, to manipulate the linguist's
data in useful but not overly complex ways. Essentially the machine
rearranges the raw material into word indices, phonetic or lexical
concordances, or lists matching word with gloss in an annotated text.
No work of any analytical interest is done by the computer, but the
data are made tidier and less forbidding to work with (see Kay 1969).
Even more interesting applications are found in the area of hypothesis
testing: systems are now in operation that accept proposed syntactic
or phonological rules in a format familiar to the linguist, apply
the rules in a given order, and generate sequences that can be tested
for acceptability against the native speaker's intuition (see Friedman
et al. forthcoming; Bobrow and Fraser 1968). Both of these applications
come close to finding the ground where the best talents of man and
machine can operate most effectively.
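The rule-testing systems cited above accept rules, apply them in a stated order, and print the result for the linguist to judge. A toy version, assuming rules are written as regular-expression rewrites, shows why the order matters; the two rules and the form "lanb" are invented for the example and are not drawn from Friedman et al. or Bobrow and Fraser.

```python
import re

# Invented toy rules: (name, pattern, replacement), applied top to bottom.
RULES = [
    ("nasal assimilation", r"n(?=b)", "m"),  # n becomes m before b
    ("final b deletion", r"(?<=m)b$", ""),   # word-final b drops after m
]

def derive(form, rules):
    """Apply each rule in turn to the whole form; return every stage."""
    steps = [form]
    for _name, pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
        steps.append(form)
    return steps

if __name__ == "__main__":
    print(derive("lanb", RULES))                  # ['lanb', 'lamb', 'lam']
    print(derive("lanb", list(reversed(RULES))))  # ['lanb', 'lanb', 'lamb']
```

The machine only carries out the derivation; deciding whether "lam" or "lamb" is the acceptable output remains the linguist's task, which is the division of labor the paragraph describes.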

Lexicographers are only beginning to make use of interactive
systems of this sort. At the University of Wisconsin, a system has been
devised for the Dictionary of American Regional English that allows the
editor to manipulate the data stored on magnetic tape from a remote
terminal, eliminating the time- and energy-wasting processes of hand
filing. High costs and minor technical difficulties may
yet prevent the use of this system in the actual editing process, but
the DARE editors have demonstrated the possibility of using computerized
procedures from the optical scanning of prepared data through the
final editing process (Venezky 1968). Proofreading of the recent
American Heritage Dictionary was carried out with the aid of a cathode
ray tube terminal which allowed editors to alter the dictionary
entries in machine storage. This system came into play only after
the main editorial work had been completed, but the use of a terminal
of this sort in earlier stages of composition is certainly possible
(Publishers' Weekly 1969). The ultimate success of all such systems,
however, depends on the ability of thinking lexicographers to rationalize
their tasks and procedures and to balance economies against results.

In our work at Michigan, we have experimented with an interactive
system for editing developed by Dr. Walter Reitman (see Reitman et al.
1969). Citation slips previously encoded appear on a small television
screen and the editor writes a provisional definition for that usage.
This definition automatically becomes a node on a defining tree, and
subsequent citations can be assigned to that node or to a new provisional
definition written by the editor. When all the citations have been
examined, the editor can then arrange the clusters of citations in
a way that leads to the hierarchy of senses in the published dictionary.
This system is relatively cheap to operate and does not require
extraordinarily expensive hardware. Even less costly systems are
promised shortly (Bitzer and Skaperdas 1969), and the regular use of
aids of this kind may soon become commonplace in lexicography and in
the preparation of scholarly editions.
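The defining tree described above can be modeled as a small data structure: each provisional definition is a node holding the citations assigned to it, and the editor may later move nodes to build the sense hierarchy. The sketch below is an illustration under stated assumptions, not Reitman's system; the classes `SenseNode` and `DefiningTree` and their methods are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare nodes by identity, not by contents
class SenseNode:
    definition: str
    citations: list = field(default_factory=list)
    children: list = field(default_factory=list)

class DefiningTree:
    def __init__(self, headword):
        self.root = SenseNode(headword)

    def new_sense(self, definition, citation, parent=None):
        """Write a provisional definition for a citation; it becomes a node."""
        node = SenseNode(definition, [citation])
        (parent if parent is not None else self.root).children.append(node)
        return node

    def assign(self, node, citation):
        """File a later citation under an existing provisional definition."""
        node.citations.append(citation)

    def _parent_of(self, target, start=None):
        """Search the subtree under `start` for the parent of `target`."""
        start = self.root if start is None else start
        for child in start.children:
            if child is target:
                return start
            found = self._parent_of(target, child)
            if found is not None:
                return found
        return None

    def move(self, node, new_parent):
        """Re-file a sense (with its subsenses) under another sense."""
        if new_parent is node or self._parent_of(new_parent, node) is not None:
            raise ValueError("cannot move a sense beneath itself")
        old_parent = self._parent_of(node)
        if old_parent is None:
            raise ValueError("sense is not in this tree")
        old_parent.children = [c for c in old_parent.children if c is not node]
        new_parent.children.append(node)

    def outline(self):
        """Number the senses as they might appear in the printed entry."""
        lines = []

        def walk(node, prefix):
            for i, child in enumerate(node.children, 1):
                label = f"{prefix}{i}"
                n = len(child.citations)
                lines.append(f"{label} {child.definition} "
                             f"({n} citation{'' if n == 1 else 's'})")
                walk(child, label + ".")

        walk(self.root, "")
        return lines

if __name__ == "__main__":
    tree = DefiningTree("helm")
    ship = tree.new_sense("steering gear of a ship", "citation 1")
    tree.assign(ship, "citation 2")
    tree.new_sense("a helmet", "citation 3")
    tiller = tree.new_sense("the tiller itself", "citation 4")
    tree.move(tiller, ship)
    print("\n".join(tree.outline()))
```

Here the computer only keeps the clusters in order; writing the definitions and deciding where each sense belongs stay with the editor.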

3. Producing the finished copy. Once the material has been collected
and the editorial process completed, the computer may also find a role
in the production of the finished dictionary through operation of
computer-driven typesetting machines. Commercial lexicographers have
been anticipating this development for a decade, and only last-minute
difficulties prevented the production of the Random House Dictionary
of the English Language by this means (see Urdang 1966). The recent
development of an information system capable of handling typographical
complexities made it possible to produce the handsome American Heritage
Dictionary by such means, and cheaper systems will soon make the benefits
of these schemes available to the scholarly lexicographer as well. The
Wycliffe Bible Translators in Mexico City have shown the feasibility of
producing testaments in a great variety of special typefaces and
styles, and similar systems could be used to handle the complex format
that we have come to expect in our English dictionaries.

Economy is not the only benefit to be anticipated from computer
production. Standards of textual accuracy can be raised considerably
by designing routines to check the editor's finished product;
cross-referencing, for example, could be carefully scrutinized, thus saving
the editor from one source of animadversion by sharp-eyed reviewers.
Should the final stages of dictionary-making be mechanized, it
would not be difficult to produce abridgments of various sorts to
fill the demands of students as well as scholars. But even more
important, the computer-encoded dictionary would allow regular revision
and alteration as new evidence comes to light. As a by-product of
lexicography, the tape or disc containing the dictionary might
eventually play an important role in an enormous question-answering
system of the kind now envisaged by many scholars in information
science. Though these techniques and applications are not of great
interest to the lexicographer, they do suggest that lexicography,
as well as other applications of humanistic computing, may eventually
come to play an important part in the text-editing schemes, translators,
and information systems of the future.
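A routine that scrutinizes cross-referencing, as suggested above, needs only two passes: collect the headwords, then confirm that every target named in a cross-reference is among them. The sketch assumes, hypothetically, that entries are held as a mapping from headword to entry text and that cross-references are written as "see" or "cf." followed by the target in capitals; a real dictionary's conventions would differ.

```python
import re

# Assumed convention: "see TARGET" or "cf. TARGET", target in capitals.
XREF = re.compile(r"\b(?:[Ss]ee|[Cc]f\.)\s+([A-Z]+(?:-[A-Z]+)*)\b")

def check_cross_references(entries):
    """Return (headword, target) pairs whose target is not itself a headword."""
    headwords = {h.lower() for h in entries}
    problems = []
    for head, text in entries.items():
        for target in XREF.findall(text):
            if target.lower() not in headwords:
                problems.append((head, target))
    return problems

if __name__ == "__main__":
    entries = {
        "helm": "1. The tiller of a ship. 2. A helmet; see HELMET.",
        "helmet": "Defensive armour for the head; cf. HELM.",
        "tiller": "The handle of a rudder; see RUDDER.",
    }
    print(check_cross_references(entries))  # [('tiller', 'RUDDER')]
```

Such a check catches only dangling references; whether a reference points to the right sense is still a matter for the editor.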

Other forms of modern technology may soon have an impact on the
lexicographer's work. Scholarly dictionaries are notoriously slow
projects; of the six dictionaries that were begun in response to
Craigie's call for regional and period dictionaries in 1919, only one
is now completed (the Dictionary of American English), and after more
than a quarter century of work the Dictionary of the Older Scottish
Tongue, the Scottish National Dictionary, and the Middle English
Dictionary are still far from completion. The scholarship that makes
use of these works cannot stand still, and some interim means is
needed for disseminating the lexical information now locked in their
files. Microform technology offers the possibility of making the
citation files for these dictionaries available at major research
centers, not only as an aid to literary scholars but also for use
by lexicographers interested in other periods of the language.
Furthermore, the availability of vocabulary citations on microfilm
would offer the scholar interested in particular usages more information
than a selection from the file in a published dictionary could possibly
give him. The cost and size of a scholarly dictionary might also be
reduced by a system of reference keys between the dictionary and the
microfilm collection, an innovation that might well hasten the
production of such works.

Rising costs and new styles of scholarship have combined to
inhibit the initiation of new lexicographical projects. Yet recent
years have seen the production of two new historical dictionaries for
English, the Dictionary of Canadianisms and the Dictionary of Jamaican
English. Similar undertakings are afoot in Australia and New Zealand,
and A. W. Read shortly expects to begin editing his Dictionary of
Briticisms, a project first announced in 1938. R. W. Burchfield's
supplementary volume to the OED is now in production, while other
projects have been tentatively outlined (see Crystal 196? and Orszagh
1967). Technology derived from the computer and from micromethods
can help speed the completion of such important accounts of regional
and period English. The impact of technology will be greater, however,
and more significant if British and American dictionary makers
follow the example of their continental colleagues, as Sledd suggests,
in treating technological innovation as merely a catalyst for a
general examination of traditional practices. The computer is only
a tool, but like any tool it can be used to overcome inherent deficiencies
and extend limited powers.
on Computer Research in Language and Literature of the Midwest

and its relation to contemporary theories of language. In addition

generative grammar, works by the following authors (listed at the
end of this essay) are of interest in considering the practical
problems of dictionary making from a linguistic point of view;